Basic probability

Author
Affiliation

Dave Clark

Binghamton University

Published

January 8, 2025

Probability basics

Why Probability Distributions?

Inferential models depend on probability distributions:

  • estimation - is not deterministic, and so admits the unknown via disturbances \(\epsilon\). We characterize those unknowns as following some probability distribution.

  • inference - is essential because of the unknowns. Inference is possible because of the probability distribution characterizing the unknowns.

Probability Distributions

Probability distributions permit statements about relative scale, frequency, and uncertainty.

\(~\)

If the relation between \(x\) and \(y\) is measured by \(\widehat{\beta}\), we need to know whether or not to take \(\widehat{\beta}\) seriously - is it representative of the population relationship between \(x\) and \(y\)?

Things we want to know about \(x\)

  • \(Pr(X = x)\) - the probability of observing any particular value of \(x\); these together comprise the density.
  • \(Pr(X \leq x)\) - the probability of observing values up to and including \(x\) (or any range of \(x\)). (CDF)
  • central tendency of \(x\) - mean, median, mode.
  • dispersion - variance, or standard deviation (the average difference of any observation from the expected value).

Types of variables

Variables are either

  • discrete - observations match to integers; all possible values are clearly distinguishable; not divisible. E.g., number of protests in DC this year; an individual’s sex; Polity score.

  • continuous - observations can take on any real value between boundaries (sometimes \(-\infty, +\infty\)); infinitely divisible. E.g., household income, GDP per capita.

Discrete variables

May be of two types or levels of measurement:

  • nominal - categories are distinct, but lack order. E.g., religion = Hindu, Muslim, Catholic, Protestent, Jewish. Binary variables are nominal, e.g., Sex = male (0), female (1); do you have blue eyes? yes (0), no (1).

  • ordinal - take on countable values, increasing/decreasing in some dimension. E.g., Polity -10, -9, \(\ldots\) 0, 1, \(\ldots\) 9, 10 increasing in democracy; survey responses “Do you feel safe traveling abroad?” Not at all; sometimes; yes, completely.

Continuous variables

Can be of two types (levels of measurement):

  • interval - 1 unit increase has same meaning across the scale (i.e., the intervals are the same); e.g., degrees Celsius or Fahrenheit.

  • ratio - intervals but also has a meaningful absolute zero; e.g., weight in pounds; zero lbs indicates the absence of weight; Venmo balance = zero, means actually no money; degrees Kelvin. Duration of a war in days - zero days means there’s no war.

Levels of measurement

These four levels of measurement can be ordered by the amount of information a variable contains, least to most:

  • nominal

  • ordinal

  • interval

  • ratio

We can turn higher levels to lower levels, but not the opposite - doing so sacrifices information.

Levels of Measurement and Models

In general, the level of measurement of \(y\) (so the type and amount of information in a variable) shapes what type of model is appropriate.

  • discrete variables usually require statistics/models in the Binomial family (for our purposes, mostly MLE models like the Logit.)

  • continuous variables usually require statistics/models in the Normal/Gaussian family (for our purposes, mostly OLS models like the linear regression.)

Distributions

PDF and CDF

Probability distributions can be described by

  • Probability Density Function (PDF) which maps \(X\) onto the probability space describing the probabilities of every value of \(X\), \(x\).

  • Cumulative Distribution Function (CDF) which maps \(X\) onto the probability space describing the probability \(X\) is less than some value, \(x\).

PDF (Density)

Definition

The Probability Density Function or PDF describes the probability a random variable takes on a particular value, and it does so for all values of that random variable.


PDF (Density)

Example

Suppose we toss a fair coin 1 time and want to describe the probability distribution of the outcome, \(Y\) which is a random variable. The possible outcomes are heads and tails, and the probability of each is .5. This is the PDF because it describes the probabilities associated with all possible outcomes of \(Y\).


PDF (Density)

  • While establishing the probability of a value of a discrete variable is possible, establishing the probability of a particular value of a continuous variable is not.

  • Instead, the continuous PDF describes the instantaneous rate of change in \(Pr(X=x)\) for every value of \(x\).

PDF plots

CDF

Definition

For a random variable, \(Y\), the probability \(Y\) is less than or equal to some particular value, \(y\) in the range of \(Y\) defines the cumulative density function.

For a continuous variable:

\[P(Y \leq j) =\int\limits_{-\infty}^{j} f(X)dX\]

For a discrete variable:

\[P(Y \leq j) = \sum\limits_{y\leq j} P(Y=y) = 1- \sum\limits_{y> j} P(Y=y)\]

CDF plots

Notation

  • \(f(x)\) denotes the PDF of \(x\).

  • \(F(x)\) denotes the CDF of \(x\).

  • Substitute either the appropriate name, symbol, or function for \(F\) or \(f\) to indicate the distribution.

  • Lower case Greek letters denote PDFs: e.g. \(f(x)= \phi(x)\) denotes the normal PDF.

  • Upper case Greek letters denote CDFs: e.g. \(F(x)=\Phi(x)\) denotes the normal CDF.


Notation

Example

\(x\) is distributed Normal, \(N(\mu, \sigma^{2})\):

\[F(x) = \Phi_{\mu, \sigma^{2}}(x) = \int \phi_{\mu, \sigma^{2}} (x) f(x)d(x) \]

\[ f(x) = \phi_{\mu, \sigma^{2}}(x) =\frac{1}{\sigma \sqrt{2\pi}} \exp \left( - \frac{(x-\mu)^{2}}{2 \sigma^{2}} \right) \]

Note the first derivative of the CDF is the PDF; the integral of the PDF is the CDF.

Discrete Probability Distributions

Bernoulli

Suppose a binary variable \(x\) that takes on only the values of zero and one (\(x \in {0, 1}\)):

\[ Pr(X=1)=\pi\nonumber \\ Pr(x=0)= 1-Pr(x=1) \\ = 1-\pi \]

Bernoulli

The PDF is:

\[ f(x) = \\ \pi ~~~~~~~~ ~ ~ ~~~~ \text{if } x=1\\ 1-\pi ~ ~ ~ ~~~~\text{if } x=0 \]

or

\[f(x) = \pi^{x}(1-\pi)^{1-x} \]

Bernoulli

The CDF is:

\[F(x) =\sum_{x} f(x) \]

and the expected value is

\[ E(x) =\sum_{x}x f(x) \\ = (1)(\pi)+(0)(1-\pi) \\ = \pi \]

Binomial

Bernoulli is important because it is the foundation for a lot of other distributions, including the binomial distribution. The binomial describes the success probability function (where \(x=1\) is a “success”) from a set of \(n\) independent Bernoulli trials. So, the binomial is the probability of successes (“ones”) in \(n\) independent Bernoulli trials with identical probabilities, \(\pi\).

\(~\)

There are two essential parts to the binomial PDF - the probability of success, and the number of ways a success can occur.

Binomial

The probability of success is the Bernoulli probability:

\[ f(x) = \pi^{x}(1-\pi)^{1-x} \]

and the number of ways success can occur (called “n-tuples”) is

\[ \begin{pmatrix} n \\ x \end{pmatrix} = \frac{n!}{x!(n-x)!} \]

Notice the notation for the n-tuple.

Binomial

The PDF combines these:

\[ f(x)= \begin{pmatrix} n \\ x \end{pmatrix} \pi^{x}(1-\pi)^{n-x} \]

There are \(n\) trials, \(x\) is the number of successes, and \(n-x\) is the number of failures. Each event in the n-tuple arises with the Bernoulli probability.

Binomial

Example

Suppose we are going to toss a fair coin 4 times and want to know how many ways we can have heads come up twice.

\[ \frac{4!}{2!(4-2)!} = 6 \] Now, suppose we want to know the probability exactly 2 of those 4 tosses is heads.

\[ Pr(x=2) =\frac{4!}{2!(4-2)!} (.5)^{2}(.5)^{2} \\ = 6(0.0625) \\ = 0.375 \]

Binomial

Example

Here’s another example: suppose we flip 5 fair coins once each; what is the PDF of the number of heads we expect? In other words, what is the probability associated with each possible number of successes (heads), from 0 to 5?

\[ P(x=0) = \begin{pmatrix} 5 \\ 0 \end{pmatrix} \cdot (.5)^{0} \cdot (.5) ^{5} = 1 \cdot 1 \cdot 0.03125=0.03125\\ \] \[ P(x=1) = \begin{pmatrix} 5 \\ 1 \end{pmatrix} \cdot (.5)^{1}\cdot (.5)^{4} = 5 \cdot .5 \cdot 0.0625=0.15625 \\ \] \[P(x=2) = \begin{pmatrix} 5 \\ 2 \end{pmatrix} \cdot (.5)^{2}\cdot (.5)^{3} = 10 \cdot .25 \cdot 0.125=0.3125 \\ \] \[ \vdots \vdots \vdots \]

Binomial family distributions

  • the geometric distribution describes repeated Bernoulli trials with probability of success \(\pi\), until the first success.

  • the negative binomial describes the number of Bernoulli failures prior to the first success - it can be thought of as a counting process up to the first success.

  • the poisson describes Bernoulli trials where \(\pi\) for any particular trial is very small.

Continuous Probability Distributions

The Normal Distribution

The Normal distribution is the most widely used distribution in the social sciences.

  • The normal seems to “fit” a lot of the variables social scientists measure and use in models.

  • Estimation techniques we often employ assume the disturbance term is normally distributed; one result is the coefficients are either normally distributed or t-distributed.

Normal PDF

The normal PDF: \[ Pr(Y=y_{i})=\frac{1}{\sqrt{2 \pi \sigma^{2}}} exp \left[\frac{-(y_{i}-\mu_{i})^{2}}{2\sigma^{2}}\right] \]

where two parameters, \(\mu\) and \(\sigma^{2}\) describe the location and shape of the distribution, the mean and variance respectively; we indicate a normally distributed variable and its parameters as

\[ Y \sim \text{Normal}(\mu,\sigma^{2}) \nonumber \]


code
gap <- read.csv("/Users/dave/documents/teaching/501/2024/slides/L1-data/data/gapminder.csv")

ggplot(gap, aes(x=alcohol_consumption_per_adult_15plus_litres)) +
  geom_histogram(aes(y=..density..), colour="black", fill="white")+
 geom_density(alpha=.2, fill="#FF6666") +
   labs(x = "Liters of Booze", y= "Density", caption="Alcohol Consumption, Gapminder")+
  ggtitle("Density - Alcohol Consumption per Adult (Liters)") 

code
gap$nalc <- dnorm(gap$alcohol_consumption_per_adult_15plus_litres, mean=mean(gap$alcohol_consumption_per_adult_15plus_litres, na.rm = TRUE), sd=sd(gap$alcohol_consumption_per_adult_15plus_litres, na.rm = TRUE))

ggplot() + 
  geom_histogram(data=gap, aes(x=alcohol_consumption_per_adult_15plus_litres, y=..density..), colour="black", fill="white") +
  geom_density(data=gap, aes(x=alcohol_consumption_per_adult_15plus_litres), alpha=.2, fill="#FF6666")  +
 geom_line(data=gap, aes(x=alcohol_consumption_per_adult_15plus_litres, y=nalc), linetype="longdash", size=1)+
   labs(x = "Liters of Booze", y= "Density", caption="Alcohol Consumption, Gapminder")+
  ggtitle("Alcohol Consumption per Adult (Liters), Normal PDF")

Standard Normal

A useful special case of the Normal is the standard normal, \(z \sim \text{Normal}(0,1)\). The PDF for the standard normal is given by

\[ \phi(z)=\frac{1}{\sqrt{2 \pi }} exp \left[\frac{-z^{2}}{2}\right] \nonumber \]

where the parameters themselves drop out since we’d be subtracting a mean of zero and dividing/multiplying by a variance of 1. The standard normal CDF is denoted \(\Phi(z)\); the standard normal PDF is denoted \(\phi(z)\).

Normal PDFs with different moments

The following plot shows three normal PDFs with different means and variances. The PDFs are centered at -1, 0, and 1, and have variances of .5, 1, and 1.5 respectively. Thinking in terms of uncertainty (foreshadowing a bit), imagine that larger variance indicates more uncertainty about the mean, and smaller variance indicates less uncertainty.

Why Distributions Matter to models

Models

  • Probability models include unobserved disturbances; we make assumptions about the distributions of those errors, \(\epsilon\).

  • What we assume about the unobservables is always informed by what we know about the observables, mainly \(y\).

  • A useful way to describe or summarize \(y\) is to characterize its observed distribution.

  • The distribution of \(y\) informs our assumption about the distribution of \(\epsilon\).

Why this matters to OLS

Linear regression is such an inferential model; we have a number of sources of uncertainty. We represent that uncertainty in the model via the disturbance term, \(\epsilon\).

\(~\)

In order to know things about \(\epsilon\) and other parts of the model, we assume that the disturbances are normally distributed; \(y\) should be normal too.

Normality and Centrality

We rely on stuff being normally distributed, and central tendency being meaningful. The Central Limit Theorem facilitates this.

Central Limit Theorem

Suppose \(n\) random variables - call them \(X_1, X_2 \ldots X_n\). These variables are not identically distributed; some are discrete, some continuous. Each variable, \(X_i\) has mean \(\bar{X_k}\).

\[ {\sum\limits_{n \rightarrow \infty} (\bar{X_i})} / {n} \sim N (\mu, \sigma^2) \]

As \(n\) approaches infinity, the distribution of means, \(\bar{X_i}\) is distributed Normal, with mean \(\widetilde{X_i}\). We could accomplish the same thing by repeated sampling of a single variable.

Simulating the Central Limit Theorem

The following app simulates the Central Limit Theorem. You can select a distribution, sample size, and number of simulations. The app will plot the distribution of the means of the samples. You’ll see that, regardless of the distribution the means are drawn from, the distribution of the means is normal as the sample size increases.

CLT - Why is this valuable?

  • If we assume repeated sampling and/or infinitely large samples, we can assume normality.

  • We know the properties of the Normal intimately well. We can use this knowledge to evaluate where some value of \(x\) lies on the Normal CDF, what the probability less than that value is - we can figure out what the probability of observing that value is (on the PDF).

  • As we’ll see, \(\widehat{\beta}\) is normally distributed in the OLS model (it is in MLE as well). The CLT and what we know about normality facilitate inference.

Inference

Inference is our effort to measure and characterize our uncertainty about the model and its parameters.

  • uncertainty is the most important thing we estimate in inferential models.

  • characterizing uncertainty relies on probability theory.

Back to top